Stela: On-Demand Elasticity in Distributed Data Stream Processing Systems

Authors

  • Indranil Gupta
  • Boyang Peng
  • Boyang Jerry Peng
Abstract

Big data is characterized by volume and velocity [24], and several real-time stream processing systems have recently emerged to meet this challenge. These systems process streams of data in real time and produce computational results. However, current popular data stream processing systems lack the ability to scale out and scale in (i.e., increase or decrease the number of machines or VMs allocated to the application) efficiently and unintrusively when requested by the user on demand. To scale out or in, a critical problem is determining which operator(s) of the stream processing application should be given more resources, or have resources taken away, in order to maximize application throughput. We address this problem with a novel metric called "Expected Throughput Percentage" (ETP). ETP takes into account not only congested elements of the stream processing application but also their effect on downstream elements and on the overall application throughput. Next, we show how our new system, called Stela (STream processing ELAsticity), incorporates ETP in its scheduling strategy. Stela enables scale-out and scale-in operations on demand, and achieves the twin goals of optimizing post-scaling throughput and minimizing interference with throughput during scaling. We have integrated the implementation of Stela into Apache Storm [27], a popular data stream processing system. We conducted experiments on Stela using a set of micro-benchmark topologies as well as two topologies from Yahoo! Inc. Our experimental results show that Stela achieves 45% to 120% higher post-scaling throughput than the default Storm scheduler when performing scale-out operations, and a 40% to 500% throughput improvement over the default scheduler during the scale-in stage. This work is a joint project with Master's student Boyang Peng [1].

For Mum and Dad, who give me all they have.

Acknowledgments

I would like to thank my advisor, Indranil Gupta, for providing invaluable support, inspiration, and guidance for my research during my studies. I am also very grateful to him for his millions of suggestions and corrections, which helped me improve my writing skills. I would like to thank Boyang Jerry Peng for his collaboration on this project [1]. This work would not have been possible without them. I would also like to express my sincere gratitude to all former and current members of the Distributed Protocols Research Group (DPRG) for their constant support, like a family. Working with them has …

Similar resources

Stela: On-Demand Elasticity in Distributed Data

Big data is characterized by volume and velocity [24], and several real-time stream processing systems have recently emerged to meet this challenge. These systems process streams of data in real time and produce computational results. However, current popular data stream processing systems lack the ability to scale out and scale in (i.e., increase or decrease the number of machines or VMs allocated t...

Elasticity and Resource Aware

The era of big data has led to the emergence of new systems for real-time distributed stream processing, such as Apache Storm, one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks many important and desired features. One important feature is elasticity for clusters running Storm, i.e., changing the cluster size on de...

Efficient Migration of Very Large Distributed State for Scalable Stream Processing

Any scalable stream data processing engine must handle the dynamic nature of data streams and it must quickly react to every fluctuation in the data rate. Many systems successfully address data rate spikes through resource elasticity and dynamic load balancing. The main challenge is the presence of stateful operators because their internal, mutable state must be scaled out while assuring fault-...

Distributed data stream processing and edge computing: A survey on resource elasticity and future directions

Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. More recently, a...

FUGU: Elastic Data Stream Processing with Latency Constraints

Elasticity describes the ability of any distributed system to scale to a varying number of hosts in response to workload changes. It has become a mandatory architectural property for state-of-the-art cloud-based data stream processing systems, as it allows treatment of unexpected load peaks and cost-efficient execution at the same time. Although such systems scale automatically, the user still ...

Publication date: 2015